该类在java的java.util包中。
基本原理
BitSet是位操作的对象,值只有0或1即false和true,内部维护了一个long数组,初始只有一个long,所以BitSet最小的size是64,当随着存储的元素越来越多,BitSet内部会动态扩充,最终内部是由N个long来存储,这些针对操作都是透明的。
用1位来表示一个数据是否出现过,0为没有出现过,1表示出现过。使用用的时候既可根据某一个是否为0表示,此数是否出现过。
一个1G的空间,有 8*1024*1024*1024=8.58*10^9bit,也就是可以表示85亿个不同的数。
该类的成员变量
/*
* BitSets are packed into arrays of "words." Currently a word is
* a long, which consists of 64 bits, requiring 6 address bits.
* The choice of word size is determined purely by performance concerns.
*/
private final static int ADDRESS_BITS_PER_WORD = 6;
private final static int BITS_PER_WORD = 1 << ADDRESS_BITS_PER_WORD;
private final static int BIT_INDEX_MASK = BITS_PER_WORD - 1;
/* Used to shift left or right for a partial word mask */
private static final long WORD_MASK = 0xffffffffffffffffL;
/**
* @serialField bits long[]
*
* The bits in this BitSet. The ith bit is stored in bits[i/64] at
* bit position i % 64 (where bit position 0 refers to the least
* significant bit and 63 refers to the most significant bit).
*/
private static final ObjectStreamField[] serialPersistentFields = {
new ObjectStreamField("bits", long[].class),
};
/**
* The internal field corresponding to the serialField "bits".
*/
private long[] words;
/**
* The number of words in the logical size of this BitSet.
*/
private transient int wordsInUse = 0;
/**
* Whether the size of "words" is user-specified. If so, we assume
* the user knows what he's doing and try harder to preserve it.
*/
private transient boolean sizeIsSticky = false;
/* use serialVersionUID from JDK 1.0.2 for interoperability */
private static final long serialVersionUID = 7997698588986878753L;
wordIndex函数,代码如下
/**
* Given a bit index, return word index containing it.
*/
private static int wordIndex(int bitIndex) {
return bitIndex >> ADDRESS_BITS_PER_WORD;
}
这个函数用来获取某个数在words数组中的索引,通过bitIndex右移ADDRESS_BITS_PER_WORD位,即右移6位。
正常情况下,右移n位得到的值为原值除以2的n次方,而2的6次方是64,也就是bitIndex/64,words数组中每个元素能够代表64个数字,所以存储的值除以64来得到数组索引值,例如0到63的索引值为0,那么这64个数字划归到第一个数组元素中,64到127划归到第二个数组元素中,以此类推。
checkInvariants()函数,代码如下
重要的内部检查函数。
/**
* Every public method must preserve these invariants.
*/
private void checkInvariants() {
assert(wordsInUse == 0 || words[wordsInUse - 1] != 0);
assert(wordsInUse >= 0 && wordsInUse <= words.length);
assert(wordsInUse == words.length || words[wordsInUse] == 0);
}
这个函数用来验证当前对象的有效性。
recalculateWordsInUse()函数,代码如下
重要的内部检查函数。
/**
* Sets the field wordsInUse to the logical size in words of the bit set.
* WARNING:This method assumes that the number of words actually in use is
* less than or equal to the current value of wordsInUse!
*/
private void recalculateWordsInUse() {
// Traverse the bitset until a used word is found
int i;
for (i = wordsInUse-1; i >= 0; i--)
if (words[i] != 0)
break;
wordsInUse = i+1; // The new logical size
}
这个函数用来计算当前已经使用了的最大数组元素个数。
初始化函数
/**
* Creates a new bit set. All bits are initially {@code false}.
*/
public BitSet() {
//创建一个words数组,大小为1
initWords(BITS_PER_WORD);
//该数组大小不限定
sizeIsSticky = false;
}
/**
* Creates a bit set whose initial size is large enough to explicitly
* represent bits with indices in the range {@code 0} through
* {@code nbits-1}. All bits are initially {@code false}.
*
* @param nbits the initial size of the bit set
* @throws NegativeArraySizeException if the specified initial size
* is negative
*/
public BitSet(int nbits) {
// nbits can't be negative; size 0 is OK
if (nbits < 0)
throw new NegativeArraySizeException("nbits < 0: " + nbits);
initWords(nbits);
//该数组大小限定
sizeIsSticky = true;
}
/**
* Creates a bit set using words as the internal representation.
* The last word (if there is one) must be non-zero.
*/
private BitSet(long[] words) {
//将该数组赋值给BitSet对象中的words数组
this.words = words;
//将当前数组正在使用的最大数组索引设置为数组元素大小
this.wordsInUse = words.length;
//验证当前对象的合法性
checkInvariants();
}
两个构造函数,分别是一个指定了初始大小,一个没指定。如果没指定,我们可以看到默认的初始大小为, 2^6 = 64-1=63 bit. 我们知道java中long的大小就是8个字节,也就是8*8=64bit。也就是说,bitset默认的是一个long整形的大小。初始化函数指定了必要的大小。
注:如果指定了bitset的初始化大小,那么会把它规整到一个大于或者等于这个数字的64的整倍数。比如64位,bitset的大小是1个long,而65位时,bitset大小是2个long,即128位。做这么一个规定,主要是为了内存对齐,同时避免考虑到不要处理特殊情况,简化程序。
initWords函数,代码如下
private void initWords(int nbits) {
words = new long[wordIndex(nbits-1) + 1];
}
这个函数用来创建数组words,该数组元素大小通过wordIndex(nbits-1)+1来获取,而wordIndex上面我们讲了,用来得到某个值所在words数组的索引,然后创建words数组的时候一定要保证数组个数大于等于该值所在的索引值。
valueof函数,代码如下
/**
* Returns a new bit set containing all the bits in the given long array.
*
* <p>More precisely,
* <br>{@code BitSet.valueOf(longs).get(n) == ((longs[n/64] & (1L<<(n%64))) != 0)}
* <br>for all {@code n < 64 * longs.length}.
*
* <p>This method is equivalent to
* {@code BitSet.valueOf(LongBuffer.wrap(longs))}.
*
* @param longs a long array containing a little-endian representation
* of a sequence of bits to be used as the initial bits of the
* new bit set
* @return a {@code BitSet} containing all the bits in the long array
* @since 1.7
*/
public static BitSet valueOf(long[] longs) {
int n;
//倒序从longs数组中获取第一个不为0的元素所在的索引值
for (n = longs.length; n > 0 && longs[n - 1] == 0; n--)
;
//从longs数组中拷贝索引从0到n的元素,放到新创建的数组中,然后将该新数组构造BitSet类对象
return new BitSet(Arrays.copyOf(longs, n));
}
/**
* Returns a new bit set containing all the bits in the given long
* buffer between its position and limit.
*
* <p>More precisely,
* <br>{@code BitSet.valueOf(lb).get(n) == ((lb.get(lb.position()+n/64) & (1L<<(n%64))) != 0)}
* <br>for all {@code n < 64 * lb.remaining()}.
*
* <p>The long buffer is not modified by this method, and no
* reference to the buffer is retained by the bit set.
*
* @param lb a long buffer containing a little-endian representation
* of a sequence of bits between its position and limit, to be
* used as the initial bits of the new bit set
* @return a {@code BitSet} containing all the bits in the buffer in the
* specified range
* @since 1.7
*/
public static BitSet valueOf(LongBuffer lb) {
lb = lb.slice();
int n;
for (n = lb.remaining(); n > 0 && lb.get(n - 1) == 0; n--)
;
long[] words = new long[n];
lb.get(words);
return new BitSet(words);
}
/**
* Returns a new bit set containing all the bits in the given byte array.
*
* <p>More precisely,
* <br>{@code BitSet.valueOf(bytes).get(n) == ((bytes[n/8] & (1<<(n%8))) != 0)}
* <br>for all {@code n < 8 * bytes.length}.
*
* <p>This method is equivalent to
* {@code BitSet.valueOf(ByteBuffer.wrap(bytes))}.
*
* @param bytes a byte array containing a little-endian
* representation of a sequence of bits to be used as the
* initial bits of the new bit set
* @return a {@code BitSet} containing all the bits in the byte array
* @since 1.7
*/
public static BitSet valueOf(byte[] bytes) {
return BitSet.valueOf(ByteBuffer.wrap(bytes));
}
/**
* Returns a new bit set containing all the bits in the given byte
* buffer between its position and limit.
*
* <p>More precisely,
* <br>{@code BitSet.valueOf(bb).get(n) == ((bb.get(bb.position()+n/8) & (1<<(n%8))) != 0)}
* <br>for all {@code n < 8 * bb.remaining()}.
*
* <p>The byte buffer is not modified by this method, and no
* reference to the buffer is retained by the bit set.
*
* @param bb a byte buffer containing a little-endian representation
* of a sequence of bits between its position and limit, to be
* used as the initial bits of the new bit set
* @return a {@code BitSet} containing all the bits in the buffer in the
* specified range
* @since 1.7
*/
public static BitSet valueOf(ByteBuffer bb) {
bb = bb.slice().order(ByteOrder.LITTLE_ENDIAN);
int n;
for (n = bb.remaining(); n > 0 && bb.get(n - 1) == 0; n--)
;
long[] words = new long[(n + 7) / 8];
bb.limit(n);
int i = 0;
while (bb.remaining() >= 8)
words[i++] = bb.getLong();
for (int remaining = bb.remaining(), j = 0; j < remaining; j++)
words[i] |= (bb.get() & 0xffL) << (8 * j);
return new BitSet(words);
}
ensureCapacity()函数,代码如下
/**
* Ensures that the BitSet can hold enough words.
* @param wordsRequired the minimum acceptable number of words.
*/
private void ensureCapacity(int wordsRequired) {
//如果words的大小小于所需要的
if (words.length < wordsRequired) {
//扩容最终的大小,最小为原来的两倍
int request = Math.max(2 * words.length, wordsRequired);
//创建新的数组,容量为request,然后将原来的数组拷贝到新的数组中
words = Arrays.copyOf(words, request);
//数组大小不固定的标志
sizeIsSticky = false;
}
}
这个函数用来动态扩容。
flip()函数,用来将指定的位数取反操作,也就是原来位为0,取反后变成1,原来为1,取反后变成0。
set函数,用来将指定的位设置为1。
clear函数,用来将指定的位设置成0。
get函数,用来获取某一位是否有效,或者获取某个BitSet类对象的指定范围的数值到一个新的BitSet类对象中。
nextSetBit()函数,获取从fromIndex开始第一个设置为1的bit索引值。
nextClearBit函数,获取从fromIndex开始第一个设置为0的bit索引值。
previousSetBit函数
将该fromIndex所在的bit以及它的低位设置成1,然后去和相应的words数组元素进行&操作(此时fromIndex所在的bit以及它的低位之外的其他bit都变成了0),如果不为0,说明该words元素上有符合要求的bit位,然后查找的规则就是从高位开始查找,直到遇到第一个bit为1的,就是我们要找的。
cardinality函数,代码如下
/**
* Returns the number of bits set to {@code true} in this {@code BitSet}.
*
* @return the number of bits set to {@code true} in this {@code BitSet}
* @since 1.4
*/
public int cardinality() {
int sum = 0;
for (int i = 0; i < wordsInUse; i++)
sum += Long.bitCount(words[i]);
return sum;
}
用来获取words数组中每个元素bit为1的总bit数。
使用场景
常见的应用是那些需要对海量数据进行一些统计工作的时候,比如日志分析、用户数统计等等。
阿里面试题:有1千万个随机数,随机数的范围在1到1亿之间。现在要求写出一种算法,将1到1亿之间没有在随机数中的数求出来?
public static void main(String[] args){
Random random=new Random();
List<Integer> list=new ArrayList<>();
for(int i=0;i<10000000;i++){
int randomResult=random.nextInt(100000000);
list.add(randomResult);
}
System.out.println("产生的随机数有");
for(int i=0;i<list.size();i++){
System.out.println(list.get(i));
}
BitSet bitSet=new BitSet(100000000);
for(int i=0;i<10000000;i++) {
bitSet.set(list.get(i));
}
System.out.println("0~1亿不在上述随机数中有"+bitSet.cardinality());
for (int i = 0; i < 100000000; i++) {
if(!bitSet.get(i)){
System.out.println(i);
}
}
}
本文暂时没有评论,来添加一个吧(●'◡'●)